Language identification using parallel sub-word recognition - an ergodic HMM equivalence
Authors
Abstract
We recently proposed a parallel sub-word recognition (PSWR) system for language identification (LID) in a framework similar to the parallel phone recognition (PPR) approach in the literature, but without requiring phonetic labeling of the speech data in any of the languages in the LID task. In this paper, we show the theoretical equivalence of PSWR and ergodic-HMM (E-HMM) based LID. Here, the front-end sub-word recognizer (SWR) and back-end language model (LM) of each language in PSWR correspond to the states and state-transitions of the E-HMM in that language. This equivalence unifies the parallel phone (sub-word) recognition and ergodic-HMM approaches, which have so far been treated as two distinct frameworks in the LID literature, and thus provides further insight into both. On a 6-language LID task using the OGI-TS database, the E-HMM system achieves performance comparable to that of the PSWR system, offering clear experimental validation of their equivalence.
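The correspondence described in the abstract can be illustrated with a toy sketch. It is not the paper's system: it assumes discrete observation symbols, one-frame states standing in for sub-word units, and made-up probabilities, and every name in it (`viterbi`, `split_score`, `identify`, `MODELS`) is hypothetical. It shows that the Viterbi score of an ergodic HMM splits exactly into an emission term (the role of the sub-word recognizer) and a transition term (the role of a bigram language model over the decoded state sequence), and that LID then reduces to picking the language whose model scores highest.

```python
import math

def viterbi(obs, init, trans, emit):
    """Best-path log score and state path for a discrete ergodic HMM."""
    n = len(init)
    score = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor for each state (every transition is allowed: ergodic).
        ptr = [max(range(n), key=lambda p: prev[p] + math.log(trans[p][s]))
               for s in range(n)]
        score = [prev[ptr[s]] + math.log(trans[ptr[s]][s]) + math.log(emit[s][o])
                 for s in range(n)]
        back.append(ptr)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

def split_score(obs, path, init, trans, emit):
    """Split a path score into an 'acoustic' part and a 'language model' part."""
    acoustic = sum(math.log(emit[s][o]) for s, o in zip(path, obs))
    lm = math.log(init[path[0]]) + sum(
        math.log(trans[a][b]) for a, b in zip(path, path[1:]))
    return acoustic, lm

def identify(obs, models):
    """Return the language whose ergodic HMM gives the highest Viterbi score."""
    return max(models, key=lambda lang: viterbi(obs, **models[lang])[0])

# Toy per-language models (illustrative numbers only, not from the paper).
MODELS = {
    "lang_a": dict(init=[0.6, 0.4],
                   trans=[[0.7, 0.3], [0.4, 0.6]],   # tends to stay in a state
                   emit=[[0.9, 0.1], [0.2, 0.8]]),
    "lang_b": dict(init=[0.5, 0.5],
                   trans=[[0.2, 0.8], [0.8, 0.2]],   # tends to alternate
                   emit=[[0.9, 0.1], [0.2, 0.8]]),
}

if __name__ == "__main__":
    obs = [0, 0, 0, 1, 1, 1]
    total, path = viterbi(obs, **MODELS["lang_a"])
    acoustic, lm = split_score(obs, path, **MODELS["lang_a"])
    print(identify(obs, MODELS), total, acoustic + lm)
```

Both toy languages share the same emission table on purpose, so only the transition structure, which plays the language-model role, separates them.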
Similar resources
Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units
We propose a stochastic pronunciation model using an ergodic hidden Markov model (EHMM) of automatically derived acoustic sub-word units (SWU). The proposed EHMM discovers the pronunciation structure inherent in the acoustic training data of a word without any a priori phonetic transcriptions. The EHMM is an HMM of HMMs: its states are SWU HMMs and the state-transitions compose various possible...
Accent identification
Foreign accent identification is a new and challenging problem closely related to other relatively recent fields in the multilinguality area, such as dialect identification and language identification. This paper describes an automatic identification system for English accents from 6 different European countries. The approach is based on a parallel set of ergodic nets with context independent HMM uni...
Language modeling using PLSA-based topic HMM
In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic t...
Likelihood normalization using an ergodic HMM for continuous speech recognition
In recent speech recognition technology, the score of a hypothesis is often defined on the basis of HMM likelihood. As is well known, however, direct use of the likelihood as a scoring function causes difficult problems, especially when the length of a speech segment varies depending on the hypothesis, as in word-spotting, and some kind of normalization is indispensable. In this paper, a new method ...
Ergodic hidden Markov models and polygrams for language modeling
In this paper we present two new techniques for language modeling in speech recognition. The first technique is based on ergodic discrete density Hidden Markov Models (HMM) which can be applied to bigrams based on word categories. This statistical approach of the so-called Markov bigrams enables an efficient unsupervised learning procedure for the bigram probabilities with the well-known Baum-Welch...